IDE sample of "unsupported sources"->DataFrame #1231

Merged
merged 16 commits into master from unsupported-data-sources-examples
Jun 16, 2025

Collaborator

@Jolanrensen Jolanrensen commented Jun 3, 2025

Fixes #1215

Adds examples/guides, as IDEA projects, for using DataFrame together with:

  • Exposed
  • Spark (with Kotlin Spark API)
  • Spark
  • Multik

Updates README and documentation:

  • Updates "main features and concepts" in the README and docs to align them and to hint at interoperability with other libraries
  • Small update to the text about polymorphism, since I was editing there anyway

@Jolanrensen Jolanrensen added examples Something related to the examples documentation Improvements or additions to documentation (not KDocs) labels Jun 3, 2025
@Jolanrensen Jolanrensen force-pushed the unsupported-data-sources-examples branch from 2b58d6c to 738cd46 Compare June 3, 2025 12:13
@Jolanrensen Jolanrensen force-pushed the unsupported-data-sources-examples branch from 738cd46 to 0e16817 Compare June 3, 2025 13:32
@Jolanrensen Jolanrensen force-pushed the unsupported-data-sources-examples branch from 80a82ae to 46128e8 Compare June 5, 2025 16:06
@Jolanrensen Jolanrensen force-pushed the unsupported-data-sources-examples branch from 408b0a5 to 05fd49e Compare June 11, 2025 14:14
@Jolanrensen Jolanrensen marked this pull request as ready for review June 13, 2025 11:41
@Jolanrensen Jolanrensen added this to the 1.0.0-Beta3 milestone Jun 13, 2025
Collaborator

@AndreiKingsley AndreiKingsley left a comment

Great examples!
I also think we should add a FAQ entry (or something similar) that mentions all these external-source examples, both in the README and on the website.

* This can be useful for storing matrices for easier access later or to simply organize data read from other files.
* For example, MRI data is often stored as 3D arrays and sometimes even 4D arrays.
*/
fun main() {
Collaborator

Seems like we could claim it as a NumPy array source reader via Multik

Collaborator Author

Within limits, yes. Multik cannot read Fortran-contiguous NumPy arrays, for instance, which is something I came across when looking for MRI data.
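
For readers unfamiliar with the term: a Fortran-contiguous array stores its elements column by column, while a C-contiguous array stores them row by row (the `.npy` header records which one applies in its `fortran_order` field). The following is a minimal, library-free sketch of the difference and of a manual reordering. The function names are hypothetical and are not part of Multik, NumPy, or DataFrame; it only illustrates the layout, not how either library handles it.

```kotlin
// Offset of element (row, col) in a flat buffer stored row by row (C order).
fun cOrderOffset(row: Int, col: Int, cols: Int): Int = row * cols + col

// Offset of element (row, col) in a flat buffer stored column by column (Fortran order).
fun fortranOrderOffset(row: Int, col: Int, rows: Int): Int = col * rows + row

// Copies a Fortran-ordered rows x cols buffer into a new C-ordered buffer.
fun fortranToCOrder(data: DoubleArray, rows: Int, cols: Int): DoubleArray {
    require(data.size == rows * cols) { "Buffer size does not match the given shape" }
    return DoubleArray(rows * cols) { i ->
        val row = i / cols
        val col = i % cols
        data[fortranOrderOffset(row, col, rows)]
    }
}

fun main() {
    // The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]], stored column by column.
    val fortran = doubleArrayOf(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)
    val cOrdered = fortranToCOrder(fortran, rows = 2, cols = 3)
    println(cOrdered.toList()) // prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
}
```

A reader that assumes C order and is handed the Fortran-ordered buffer unchanged would see the matrix transposed and reshaped, which is why such files need either a conversion step like this one or re-saving in C order before loading.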

// created by Customers.toDataFrameSchema()
// The same can be done for the other tables
@DataSchema
data class CustomersDf(
Collaborator

Maybe CustomersDFSchema?

Collaborator Author

That would be a bit verbose, wouldn't it? ;P "Simply cast it to the CustomersDfSchema DataSchema and you're good to go"

Collaborator

Sometimes I want to name a schema interface/data class "...Schema" too. Let's discuss it and agree on some conventions 😄!

Collaborator Author

@Jolanrensen Jolanrensen Jun 16, 2025

Noo please don't. If you have a DataFrame<Person>, it's clear that you use Person as the schema for the dataframe, right? Adding Schema everywhere would just make it more verbose IMO. Remember that each data schema is already annotated with @DataSchema, plus the compiler plugin makes them all extend DataRowSchema as well. So you would get a "schema" overload:

@DataSchema
data class MySchema(val a: Int): DataRowSchema

val df: DataFrame<MySchema>

Not to forget the auto-generated data schemas based on JDBC/OpenAPI etc., which use their own names. Would those break the convention?
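
To make the naming argument concrete, here is a minimal sketch, assuming the Kotlin DataFrame library is on the classpath. `Person` and its fields are hypothetical and only serve as an illustration; they are not taken from the PR.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.toDataFrame

// Hypothetical schema: the annotation already marks it as a data schema,
// so the class name itself does not need a "Schema" suffix.
@DataSchema
data class Person(val name: String, val age: Int)

fun main() {
    // The type parameter shows which schema the dataframe follows.
    val df: DataFrame<Person> = listOf(Person("Alice", 30), Person("Bob", 25)).toDataFrame()
    println(df.columnNames())
}
```

Read this way, `DataFrame<Person>` already says "a dataframe with the Person schema", whereas `DataFrame<PersonSchema>` would repeat information that the annotation and the type position already carry.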

* @see toDataFrameSchemaWithNameNormalizer
*/
@Suppress("UNCHECKED_CAST")
fun Table.toDataFrameSchema(columnNameToAccessor: MutableMap<String, String> = mutableMapOf()): DataFrameSchema {
Collaborator

Very good proposal for a KDF-Exposed integration module.

Collaborator

@zaleslaw zaleslaw left a comment

Left some notes about possible renaming.

@Jolanrensen Jolanrensen merged commit 2bbba19 into master Jun 16, 2025
3 checks passed
Labels
documentation Improvements or additions to documentation (not KDocs) examples Something related to the examples
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing guide for connecting unsupported dataframe-like libraries to DataFrame
3 participants